Primary exercises

  1. Manually created factor.
    In a study, participants were asked whether their sport activity is none, oncePerWeek, severalPerWeek or daily.
    Build a proper factor for the responses below (a ? marks a missing answer) and store it in a variable w.
    Print the factor.
    Write the code to count the number of occurrences of each level and print the counts.
severalPerWeek, none, none, oncePerWeek, oncePerWeek, oncePerWeek, oncePerWeek, ?, none, none
v <- c( "severalPerWeek", "none", "none", "oncePerWeek", "oncePerWeek", "oncePerWeek", "oncePerWeek", NA, "none", "none" )
w <- factor( v, levels = c( "none", "oncePerWeek", "severalPerWeek", "daily" ) )
w
 [1] severalPerWeek none           none           oncePerWeek    oncePerWeek    oncePerWeek    oncePerWeek    <NA>          
 [9] none           none          
Levels: none oncePerWeek severalPerWeek daily
fct_count( w )
# A tibble: 5 × 2
  f                  n
  <fct>          <int>
1 none               4
2 oncePerWeek        4
3 severalPerWeek     1
4 daily              0
5 <NA>               1
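Why the levels argument matters can be seen in a minimal base-R sketch that builds the factor with and without it. The name w_default is only illustrative and is not part of the exercise.

```r
# Same responses as in the exercise; NA stands for the missing answer.
v <- c( "severalPerWeek", "none", "none", "oncePerWeek", "oncePerWeek",
        "oncePerWeek", "oncePerWeek", NA, "none", "none" )

# Without 'levels', R takes the sorted unique values that occur in v,
# so the unobserved answer "daily" is not a level at all.
w_default <- factor( v )   # illustrative name, not from the exercise
levels( w_default )

# With explicit 'levels', the order is under our control and "daily" is kept.
w <- factor( v, levels = c( "none", "oncePerWeek", "severalPerWeek", "daily" ) )
levels( w )
nlevels( w )
```

In this data set the default order happens to match the intended one, but the missing "daily" level would silently disappear from every count and plot.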
  2. A factor with random content.
    Read the help page of the function sample.
    Then study and try the following lines of code to understand the results.
    Next, understand why an error is generated and use the replace argument to generate a vector with 100 samples.
    Store this vector in a variable v and build a factor w from it.
    Finally, count the number of occurrences of each level in w.
    Ensure that the levels are in the order given in the variable lvs.
lvs <- c( "none", "oncePerWeek", "severalPerWeek", "daily" )
sample( lvs, 3 )
[1] "none"           "oncePerWeek"    "severalPerWeek"
sample( lvs, 3 )
[1] "daily"          "none"           "severalPerWeek"
sample( lvs, 3 )
[1] "daily"          "none"           "severalPerWeek"
sample( lvs, 100 )
Error in sample.int(length(x), size, replace, prob): cannot take a sample larger than the population when 'replace = FALSE'
v <- sample( lvs, 100, replace = TRUE )
w <- factor( v, levels = lvs )
w
  [1] none           oncePerWeek    oncePerWeek    none           severalPerWeek none           daily          daily         
  [9] oncePerWeek    daily          none           daily          severalPerWeek none           oncePerWeek    oncePerWeek   
 [17] severalPerWeek severalPerWeek none           oncePerWeek    none           oncePerWeek    severalPerWeek daily         
 [25] oncePerWeek    oncePerWeek    none           daily          severalPerWeek none           daily          daily         
 [33] daily          daily          oncePerWeek    severalPerWeek none           none           oncePerWeek    none          
 [41] none           severalPerWeek severalPerWeek oncePerWeek    severalPerWeek none           oncePerWeek    daily         
 [49] oncePerWeek    severalPerWeek none           none           oncePerWeek    oncePerWeek    oncePerWeek    oncePerWeek   
 [57] severalPerWeek severalPerWeek none           daily          daily          oncePerWeek    severalPerWeek oncePerWeek   
 [65] severalPerWeek oncePerWeek    oncePerWeek    oncePerWeek    daily          none           daily          oncePerWeek   
 [73] oncePerWeek    none           none           severalPerWeek oncePerWeek    severalPerWeek daily          severalPerWeek
 [81] none           none           none           oncePerWeek    oncePerWeek    none           daily          daily         
 [89] severalPerWeek severalPerWeek oncePerWeek    oncePerWeek    none           none           severalPerWeek none          
 [97] daily          severalPerWeek daily          daily         
Levels: none oncePerWeek severalPerWeek daily
fct_count( w )
# A tibble: 4 × 2
  f                  n
  <fct>          <int>
1 none              27
2 oncePerWeek       30
3 severalPerWeek    22
4 daily             21
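Because sample draws at random, the counts above will differ on every run. If reproducible results are wanted, one option is to fix the random seed first; the sketch below assumes an arbitrary seed value of 1, and the names v1 and v2 are illustrative.

```r
lvs <- c( "none", "oncePerWeek", "severalPerWeek", "daily" )

set.seed( 1 )                              # arbitrary seed, chosen only for this sketch
v1 <- sample( lvs, 100, replace = TRUE )

set.seed( 1 )                              # resetting the same seed ...
v2 <- sample( lvs, 100, replace = TRUE )

identical( v1, v2 )                        # ... reproduces the same draw

w <- factor( v1, levels = lvs )
sum( table( w ) )                          # every one of the 100 samples is counted
```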
  3. Reordering factor levels.
    When a factor is shown on an axis of a plot, the order is given by its levels.
    The factor w from the previous exercise will then be shown in this order: none, oncePerWeek, severalPerWeek, daily.
    But for a picture in a manuscript the following order might be needed: daily, severalPerWeek, oncePerWeek, none.
    Apply to w one of the fct_ functions from the tidyverse library to produce a factor w2 with the requested order.
    Show the levels of w2.
    Again show the number of elements of each level in w2 and compare them with the table from the previous exercise.
w2 <- fct_relevel( w, c( "daily", "severalPerWeek", "oncePerWeek", "none" ) )
levels( w2 )
[1] "daily"          "severalPerWeek" "oncePerWeek"    "none"          
fct_count( w2 )
# A tibble: 4 × 2
  f                  n
  <fct>          <int>
1 daily             21
2 severalPerWeek    22
3 oncePerWeek       30
4 none              27
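Since the requested order is exactly the reverse of the original one, the same result can also be sketched in base R without forcats. This is an alternative to the fct_relevel call above, not a replacement for it; w2_base is an illustrative name and the seed is arbitrary.

```r
lvs <- c( "none", "oncePerWeek", "severalPerWeek", "daily" )
set.seed( 1 )                                        # arbitrary seed for this sketch
w <- factor( sample( lvs, 100, replace = TRUE ), levels = lvs )

# Rebuild the factor with the levels in reversed order.
w2_base <- factor( w, levels = rev( levels( w ) ) )  # illustrative name
levels( w2_base )

# Only the level order changed; the values themselves are untouched.
all( as.character( w2_base ) == as.character( w ) )

# The per-level counts are the same, just listed in the new order.
table( w2_base )
```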

Extra exercises

  1. Counting with table(); getting counts for single levels.
    fct_count() is a tidyverse/forcats function that counts factor elements and returns the result as a table (a tibble object).
    The table() function from base R provides similar functionality but returns the result in another format.
    Reuse the factor w from the first primary exercise.
    Try table( w ) and compare its output with fct_count( w ).
    Store the counts as follows: cnts <- table( w ). Use square brackets on cnts to get the count of oncePerWeek.
v <- c( "severalPerWeek", "none", "none", "oncePerWeek", "oncePerWeek", "oncePerWeek", "oncePerWeek", NA, "none", "none" )
w <- factor( v, levels = c( "none", "oncePerWeek", "severalPerWeek", "daily" ) )
w
 [1] severalPerWeek none           none           oncePerWeek    oncePerWeek    oncePerWeek    oncePerWeek    <NA>          
 [9] none           none          
Levels: none oncePerWeek severalPerWeek daily
table( w )
w
          none    oncePerWeek severalPerWeek          daily 
             4              4              1              0 
fct_count( w )
# A tibble: 5 × 2
  f                  n
  <fct>          <int>
1 none               4
2 oncePerWeek        4
3 severalPerWeek     1
4 daily              0
5 <NA>               1
cnts <- table( w )
cnts[ "oncePerWeek" ]
oncePerWeek 
          4 
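One visible difference between the two outputs is that table( w ) drops the missing value while fct_count( w ) reports it. A small base-R sketch shows how table can include it through its useNA argument, and how double brackets return the bare number instead of a named entry. The name cnts_na is illustrative.

```r
v <- c( "severalPerWeek", "none", "none", "oncePerWeek", "oncePerWeek",
        "oncePerWeek", "oncePerWeek", NA, "none", "none" )
w <- factor( v, levels = c( "none", "oncePerWeek", "severalPerWeek", "daily" ) )

cnts <- table( w )
cnts[ "oncePerWeek" ]      # single brackets keep the level name
cnts[[ "oncePerWeek" ]]    # double brackets return just the integer

# By default NA is not counted; useNA = "ifany" adds an NA entry when needed.
cnts_na <- table( w, useNA = "ifany" )   # illustrative name
cnts_na
```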
  2. Special ordering of levels.
    ➡️ Go to the forcats cheat sheet to find out how to order a factor by the frequency of occurrences.
    Reuse w from the previous exercise and construct a factor w3 with the same values and with the levels sorted by descending number of occurrences.
    Count the occurrences to demonstrate correctness.
    Now, find a way to sort the levels in increasing order.
w3 <- fct_infreq( w )
fct_count( w3 )
# A tibble: 5 × 2
  f                  n
  <fct>          <int>
1 none               4
2 oncePerWeek        4
3 severalPerWeek     1
4 daily              0
5 <NA>               1
fct_count( fct_rev( w3 ) )
# A tibble: 5 × 2
  f                  n
  <fct>          <int>
1 daily              0
2 severalPerWeek     1
3 oncePerWeek        4
4 none               4
5 <NA>               1
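To see what ordering by frequency does under the hood, the same idea can be sketched in base R: count with table, order the level names by those counts, and rebuild the factor. This is a rough equivalent for illustration only, not how forcats is implemented; counts, by_freq and w3_base are illustrative names.

```r
v <- c( "severalPerWeek", "none", "none", "oncePerWeek", "oncePerWeek",
        "oncePerWeek", "oncePerWeek", NA, "none", "none" )
w <- factor( v, levels = c( "none", "oncePerWeek", "severalPerWeek", "daily" ) )

# Count each level (NA is ignored by table by default).
counts <- table( w )

# Level names ordered from most to least frequent.
by_freq <- names( counts )[ order( as.vector( counts ), decreasing = TRUE ) ]

# Rebuild the factor with the reordered levels; the values stay the same.
w3_base <- factor( w, levels = by_freq )
table( w3_base )

# Reversing the levels gives the increasing order.
table( factor( w3_base, levels = rev( levels( w3_base ) ) ) )
```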


Copyright © 2022 Biomedical Data Sciences (BDS) | LUMC